Comparing Two Approaches for Adding Feature Ranking to Sampled Ensemble Learning for Software Quality Estimation

Authors

Kehan Gao

Taghi M. Khoshgoftaar

Amri Napolitano

Abstract

High dimensionality and class imbalance are two main problems that affect the quality of training datasets in software defect prediction, resulting in inefficient classification models. Feature selection and data sampling are often used to overcome these problems. Feature selection is the process of choosing the most important attributes from the original dataset. Data sampling alters the dataset to change its balance level. Another technique, called boosting (building multiple models, with each model tuned to work better on instances misclassified by previous models), has also been found effective for resolving the class imbalance problem. In particular, RUSBoost, which integrates random undersampling with AdaBoost, has been shown to improve classification performance for imbalanced training datasets. In this study, we investigated an approach for combining feature selection with this ensemble learning (boosting) process. We focused on two different scenarios: feature selection performed prior to the boosting process and feature selection performed inside the boosting process. Ten base feature ranking techniques and an ensemble ranker based on the ten were examined and compared over the two scenarios. The experimental results demonstrate that the ensemble feature ranking method generally performed better than or similarly to the average of the base ranking techniques and, more importantly, that the ensemble method exhibited better robustness than any individual base ranking technique. As for the two scenarios, the results show that applying feature selection inside boosting performed better than applying feature selection prior to boosting.
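The two scenarios described in the abstract differ only in where the feature ranking step sits relative to the sampling loop. The following is a minimal sketch of that difference, not the authors' implementation. Everything in it is an assumption made for illustration: the two toy base rankers (`mean_gap`, `median_gap`) stand in for the ten rankers of the study, the ensemble ranker is taken to be a simple sum of ranks, the weak learner is a decision stump, the weight update is the plain two-class AdaBoost rule (which may differ from the variant used in RUSBoost), and rounds whose weighted error reaches 0.5 are skipped.

```python
import math
import random
from statistics import mean, median

def mean_gap(X, y, j):
    """Toy base ranker (hypothetical): gap between class means of feature j."""
    return abs(mean(r[j] for r, c in zip(X, y) if c == 1)
               - mean(r[j] for r, c in zip(X, y) if c == -1))

def median_gap(X, y, j):
    """Toy base ranker (hypothetical): gap between class medians of feature j."""
    return abs(median(r[j] for r, c in zip(X, y) if c == 1)
               - median(r[j] for r, c in zip(X, y) if c == -1))

def ensemble_top_k(X, y, k, rankers=(mean_gap, median_gap)):
    """Assumed ensemble ranker: sum each feature's rank over the base rankers."""
    d = len(X[0])
    total = [0] * d
    for ranker in rankers:
        order = sorted(range(d), key=lambda j: -ranker(X, y, j))
        for rank, j in enumerate(order):
            total[j] += rank
    return sorted(range(d), key=lambda j: total[j])[:k]

def undersample(y, rng):
    """Indices of all minority instances plus an equal-size random majority draw."""
    pos = [i for i, c in enumerate(y) if c == 1]
    neg = [i for i, c in enumerate(y) if c == -1]
    small, large = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return small + rng.sample(large, len(small))

def fit_stump(X, y, w, feats):
    """Weighted decision stump restricted to the selected features."""
    best = None
    for j in feats:
        for t in sorted({r[j] for r in X}):
            for sign in (1, -1):
                err = sum(wi for r, c, wi in zip(X, y, w)
                          if (sign if r[j] > t else -sign) != c)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:]  # (feature index, threshold, sign)

def stump_predict(stump, row):
    j, t, sign = stump
    return sign if row[j] > t else -sign

def boost(X, y, rounds, k, select_inside, seed=0):
    rng = random.Random(seed)
    w = [1.0 / len(X)] * len(X)
    # Scenario 1: rank once on the full training data, before boosting starts.
    fixed = None if select_inside else ensemble_top_k(X, y, k)
    model = []
    for _ in range(rounds):
        idx = undersample(y, rng)
        Xs, ys, ws = [X[i] for i in idx], [y[i] for i in idx], [w[i] for i in idx]
        # Scenario 2: rank again on each undersampled set, inside boosting.
        feats = ensemble_top_k(Xs, ys, k) if select_inside else fixed
        stump = fit_stump(Xs, ys, ws, feats)
        preds = [stump_predict(stump, r) for r in X]
        err = sum(wi for p, c, wi in zip(preds, y, w) if p != c)
        if err >= 0.5:
            continue  # no better than chance on the full set; skip this round
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        w = [wi * math.exp(-alpha * c * p) for wi, c, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
        model.append((alpha, stump))
    return model

def predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Toy imbalanced data: 10 defective (+1) vs 90 clean (-1) modules, 6 metrics;
# only metric 0 carries signal.
data_rng = random.Random(42)
y = [1] * 10 + [-1] * 90
X = [[(2.0 if c == 1 else 0.0) + data_rng.random()]
     + [data_rng.random() for _ in range(5)] for c in y]

model_before = boost(X, y, rounds=10, k=2, select_inside=False)
model_inside = boost(X, y, rounds=10, k=2, select_inside=True)
```

Setting `select_inside=False` corresponds to feature selection prior to boosting, and `select_inside=True` to feature selection inside boosting. On this toy data both variants pick the same informative metric, so the sketch only illustrates the control flow and says nothing about the performance difference reported in the abstract.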

Similar resources

Fault Detection of Anti-friction Bearing using Ensemble Machine Learning Methods

The Anti-Friction Bearing (AFB) is a very important machine component, and its unscheduled failure causes malfunctions in a wide range of rotating machinery, resulting in unexpected downtime and economic loss. In this paper, ensemble machine learning techniques are demonstrated for the detection of different AFB faults. Initially, statistical features were extracted from temporal vibratio...

Machine learning algorithms in air quality modeling

Modern studies in the field of environmental science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. Recent advances in statistical modeling based on machine learning approaches have emerged as a solution to these issues. It is a fact that input variable type largely affec...

Combining Feature Selection and Ensemble Learning for Software Quality Estimation

High dimensionality is a major problem that affects the quality of training datasets and therefore classification models. Feature selection is frequently used to deal with this problem. The goal of feature selection is to choose the most relevant and important attributes from the raw dataset. Another major challenge to building effective classification models from binary datasets is class imbal...

Bridging the semantic gap for software effort estimation by hierarchical feature selection techniques

Software project management is one of the significant activities in the software development process. Software Development Effort Estimation (SDEE) is a challenging task in software project management. SDEE has been practiced in the computer industry since the 1940s and has been reviewed several times. An SDEE model is appropriate if it provides accuracy and confidence simultaneously before softwa...

An Ensemble Learning-Based Algorithm for Learning to Rank in Information Retrieval

Learning to rank refers to machine learning techniques for training a model in a ranking task. Learning to rank has been shown to be useful in many applications of information retrieval, natural language processing, and data mining. Learning to rank can be described by two systems: a learning system and a ranking system. The learning system takes training data as input and constructs a ranking ...

Publication date: 2014
